Abstract: This paper studies the convergence rate of 
a continuous-time dynamical system for $\ell_1$-minimization, 
known as the Locally Competitive Algorithm (LCA). Solving 
$\ell_1$-minimization problems efficiently and rapidly is of great 
interest to the signal processing community, as these programs 
have been shown to recover sparse solutions to underdetermined 
systems of linear equations and come with strong performance 
guarantees. The LCA under study differs from the typical 
$\ell_1$ solver in that it operates in continuous time: instead of 
being specified by discrete iterations, it evolves according to a 
system of nonlinear ordinary differential equations. The LCA is 
constructed from simple components, giving it the potential to 
be implemented as a large-scale analog circuit. 

The goal of this paper is to give guarantees on the convergence 
time of the LCA system. To do so, we analyze how the LCA 
evolves as it is recovering a sparse signal from underdetermined 
measurements. We show that under appropriate conditions on the 
measurement matrix and the problem parameters, the path the 
LCA follows can be described as a sequence of linear differential 
equations, each with a small number of active variables. This 
allows us to relate the convergence time of the system to the 
restricted isometry constant of the matrix. Interesting parallels 
to sparse-recovery digital solvers emerge from this study. Our 
analysis covers both the noisy and noiseless settings and is 
supported by simulation results. 

Index Terms: Locally Competitive Algorithm, sparse 
approximation, Compressed Sensing, dynamical systems, 
$\ell_1$-minimization 

I. Introduction 

Compressed Sensing (CS) has triggered extensive research 
because of compelling results on the reconstruction 
of sparse signals (i.e., signals with few non-zero elements) 
from highly-undersampled linear measurements. The main 
results of CS show that coded measurements can be used 
to simultaneously acquire and compress a signal, requiring 
many fewer resources (e.g., time, storage, etc.) than traditional 
sampling approaches. However, the process of reconstructing 
the original signal from its compressed measurements requires 
a significant amount of computation and remains a bottleneck 
in the processing pipeline. 

The approach to signal reconstruction that has been most 
extensively studied involves solving an optimization program 
that minimizes a combination of a mean-squared error term 
and a sparsity-inducing term (typically measured using the 
$\ell_1$-norm). Specifically, given a set of (possibly noisy) measurements $y \in \mathbb{R}^M$ of a signal $a^\dagger \in \mathbb{R}^N$ through an $M \times N$ matrix $\Phi$, we estimate $a^\dagger$ by solving

$$a^* = \arg\min_a \; \frac{1}{2} \|y - \Phi a\|_2^2 + \lambda \|a\|_1, \qquad (1)$$

where $\|a\|_1 = \sum_i |a_i|$. The $\ell_1$-norm is used as a surrogate for the ideal counting pseudo-norm $\|a\|_0$, which counts the number of non-zero elements of $a$. Under particular conditions on $\Phi$, it can be shown that the performance of the relaxed program (1) is comparable to the idealized (but generally intractable [1]) sparse approximation problem. Despite the optimization in (1) being a convex and tractable program with many specialized solvers (e.g., [2]-[8]), the required computation in most problem sizes of interest makes it prohibitive to perform CS reconstruction in real time or on low-power embedded platforms. 

The Locally Competitive Algorithm (LCA) [9] (illustrated in Fig. 1) is a continuous-time system of coupled nonlinear differential equations that settles to the minimizer of (1) in steady state [10]. The LCA architecture consists of simple components (matrix-vector operations and a pointwise nonlinearity for thresholding), giving it the potential to be implemented in an analog circuit [11], [12]. Analog networks for solving optimization problems have a long history, dating back to Hopfield's pioneering results for linear programming [13] (a comprehensive treatment of the subject can be found in [14]). Such analog systems can potentially have significant speed and power advantages over their digital counterparts. 

While analog implementations of the LCA have the po- 
tential to alleviate the bottleneck of CS signal reconstruction 
in some scenarios, as with any signal processing system, it 
is important to have strong performance guarantees before 
deploying the system in an application. Prior work [10] has 
studied the convergence behavior of the LCA in a general 
setting (with no assumption on the signal or the matrix) 
to prove that the system has global asymptotic convergence 
to the correct solution. In this general setting, the LCA is 
shown to converge exponentially fast, provided some condition 
that depends on the solution path of the system (i.e., which 
nodes cross threshold to become active during convergence). 
More specifically, as the nodes evolve, the LCA dynamics 
switch between sets of linear ordinary differential equations 
that involve submatrices of $\Phi$. If each submatrix is 
well-conditioned, then the exponential convergence result follows. 
The main contribution of this paper is to study the specific 
case of sparse recovery. Interestingly, our analysis depends 
on the well-known Restricted Isometry Property (RIP) for 
CS measurement matrices. This condition ensures that every 
submatrix of a specific size is well-conditioned. The results in 
this paper establish conditions on problem parameters (such 
as signal sparsity S, ambient dimension N, and number 
of measurements M) that guarantee that the size of each 
submatrix is indeed small. These guarantees can then be used 
to provide strong bounds on the convergence speed of the 
system. Our resulting conditions are naturally analogous to 
existing bounds for traditional digital algorithms. 

Specifically, after reviewing the guarantees for existing digital algorithms and prior analysis of the LCA in Section II, we present our first main result (Theorem 2) in Section III-A. This theorem establishes conditions ensuring that only nodes that are part of the support of the original signal $a^\dagger$ become active during convergence. We will see that when applied to a class of random matrices, these conditions lead to a number of measurements on the order of $M = O(S^2 \log N)$, matching the results required for iterative digital algorithms (including Orthogonal Matching Pursuit [15], [16], and homotopy-based methods [2]) to take exactly $S$ steps in their solution. Our second main result (Theorem 3), presented in Section III-B, relaxes the bounds above to allow a constant number of nodes to enter the support set during convergence. These conditions lead to a number of measurements on the order of $M = O(S \log N)$ for random matrices, which is qualitatively optimal for $\ell_1$-minimization to recover sparse signals from compressive measurements. Both of these results allow us to establish a bound on the exponential rate of convergence for the LCA by relating it to the restricted isometry constant of the matrix $\Phi$. The qualitative predictions of these theoretical guarantees are explored in simulation in Section IV. 



II. Background and Related work 

The analysis in this paper differs from previous studies that 
appear in the CS literature because of the continuous nature of 
the LCA algorithm. In particular, the complexity of the LCA 
cannot be expressed in terms of a number of "iterations" as is 
often done for digital algorithms. Nevertheless, some analogies 
to previous work can be drawn. In this section, we describe 
some existing approaches to sparse recovery, their associated 
guarantees, and how they relate to the LCA. 

A. $\ell_1$-minimization 

The use of $\ell_1$-minimization (1) to recover a signal from compressed measurements has been extensively studied and was shown to lead to state-of-the-art results in several cases of interest. To present the performance guarantees associated with minimizing (1), we introduce a desirable property of the measurement matrix $\Phi$, known as the Restricted Isometry Property (RIP). This property guarantees that every submatrix formed from a small subset of columns of $\Phi$ is a near isometry. 

Definition 1: The matrix $\Phi$ satisfies the RIP of order $k$ if there exists a constant $\delta \in (0, 1)$ such that for any vector $x \in \mathbb{R}^N$ with $\|x\|_0 \le k$, we have: 

$$(1 - \delta) \|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta) \|x\|_2^2. \qquad (2)$$

We also say that $\Phi$ satisfies the RIP with parameters $(k, \delta)$. The RIP constant $\delta_k$ of order $k$ for $\Phi$ is defined as the smallest positive constant $\delta$ satisfying (2). 

If the measurement matrix $\Phi$ satisfies the RIP, then $\ell_1$-minimization is known to provide uniform and stable recovery. The former means that one choice of a measurement matrix $\Phi$ can recover every sparse signal. The latter guarantees that the recovery error scales gracefully with the noise level [17]. The sharpest result for recovery with $\ell_1$-minimization requires that the measurement matrix $\Phi$ satisfy the Restricted Isometry Property with parameters $(2S, \sqrt{2} - 1)$ [18]. Note that since the LCA provably minimizes (1), the recovery guarantees stated above all apply to using the LCA for CS recovery. 
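
As a rough numerical illustration of Definition 1, the sketch below samples random column subsets of a matrix and records how far the eigenvalues of each Gram submatrix deviate from 1. Computing $\delta_k$ exactly would require examining every subset of $k$ columns, so this sampling only yields a lower bound on the RIP constant. The use of Python with NumPy, the function name `rip_lower_bound` and its parameters are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def rip_lower_bound(Phi, k, n_trials=200, seed=0):
    """Monte Carlo lower bound on the RIP constant delta_k of Phi.

    Each trial draws k distinct columns and measures how far the
    eigenvalues of their Gram matrix deviate from 1, as in (2).
    """
    rng = np.random.default_rng(seed)
    n_cols = Phi.shape[1]
    worst = 0.0
    for _ in range(n_trials):
        cols = rng.choice(n_cols, size=k, replace=False)
        gram = Phi[:, cols].T @ Phi[:, cols]
        eigvals = np.linalg.eigvalsh(gram)  # returned in ascending order
        worst = max(worst, 1.0 - eigvals[0], eigvals[-1] - 1.0)
    return worst


# Orthonormal columns act as an exact isometry on every subset.
delta_identity = rip_lower_bound(np.eye(5), k=3)

# Two identical unit-norm columns: their Gram matrix has eigenvalues 0 and 2.
delta_duplicate = rip_lower_bound(np.array([[1.0, 1.0], [0.0, 0.0]]), k=2)
```

For a random matrix with unit-norm columns, increasing `n_trials` tightens the lower bound but never certifies the RIP, which is why the analysis in this paper relies on probabilistic estimates instead.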

The theorems in this paper are stated in terms of the RIP constant for a fixed matrix $\Phi$. However, to interpret the results, we will make use of some known expressions established for random matrices. Special cases of interest include random matrices with independent and identically distributed Bernoulli columns with unit norm, or columns drawn independently and uniformly at random from the unit sphere. Both cases yield the following result (see Theorem 5.65 in [19]): if $\Phi$ is an $M \times N$ random matrix whose columns are independent subgaussian random vectors in $\mathbb{R}^M$ with $\|\Phi_n\|_2 = 1$, then for any sparsity level $1 \le S \le N$ and any $\delta \in (0, 1)$, the matrix $\Phi$ satisfies the RIP with parameters $(S, \delta)$ with high probability provided that: 

$$M \ge C \delta^{-2} S \log(eN/S),$$

where the constant $C$ depends only on the subgaussian norm of the columns. 

From this result, we will use the following estimate for the RIP constant: 

$$\delta \sim \sqrt{\frac{S \log(N/S)}{M}}. \qquad (3)$$

State-of-the-art solvers for (1) (e.g., [4], [20], among others) can handle 
large-scale problems, but lack strong guarantees about 
their running time. On the other hand, iterative thresholding 
schemes (e.g., [5], [21]) are simple and come with guarantees 
about the number of iterations needed to achieve a certain 
accuracy. The LCA system dynamics resemble a continuous- 
time version of an iterative thresholding step. Another class 
of algorithms consists of homotopy-based schemes that trace 
a piecewise-linear solution path as the tradeoff parameter $\lambda$ 
is varied. If the solution is very sparse, these approaches 
can converge in exactly $S$ iterations (known as the $S$-step 
property) [2]. The LCA solution path is very similar to 
that of the Homotopy algorithm [2] and its approximate version LARS [22], 
because the system evolves according to a piecewise-linear 
dynamical system that changes each time a node crosses 
threshold. 



B. Greedy algorithms 

A second possible approach to sparse signal recovery is 
through the use of iterative greedy algorithms. A common ap- 
proach in this family is Orthogonal Matching Pursuit (OMP), 
which at each iteration adds to the support the element that 
has the strongest correlation with the residual. Recovery results have been shown for OMP using $O(S \log N)$ measurements in the noiseless [23] and noisy cases [24], but these results are limited. In particular, it has been shown that it is impossible to get a recovery result that is both uniform (i.e., using a single draw of the random measurement matrix) and requiring only $S$ iterations if only $O(S \log N)$ measurements are available [15]. Recent work has shown that OMP can recover an $S$-sparse signal in exactly $S$ iterations (i.e., having the $S$-step property), but it requires $O(S^2 \log N)$ noiseless measurements [16]. 

Fig. 1. Block diagram of the LCA. The internal state variables $u$ are driven by the projection of the input onto each of the $N$ dictionary elements. They produce the outputs $a$ through the activation function $T_\lambda(\cdot)$, which are then weighted by the interconnection matrix and fed back. 

In contrast to OMP, Regularized Orthogonal Matching Pursuit (ROMP) [25] and Compressive Sampling Matching Pursuit (CoSaMP) [26] add a set of nodes at each iteration. Both ROMP and CoSaMP guarantee uniform and stable recovery from only $O(S \log N)$ measurements, but they require a slightly stricter RIP constant than necessary for $\ell_1$-minimization. OMP, ROMP and CoSaMP have an overall computational complexity of $O(SMN)$ in general. 
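
The OMP iteration described in this subsection (select the column most correlated with the residual, then refit by least squares on the selected columns) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation analyzed in the cited works; the function name `omp`, the fixed iteration count `n_iter`, and the toy data are assumptions.

```python
import numpy as np


def omp(Phi, y, n_iter):
    """Orthogonal Matching Pursuit run for a fixed number of iterations."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_iter):
        correlations = np.abs(Phi.T @ residual)
        # Exclude columns that were already selected.
        correlations[np.array(support, dtype=int)] = -1.0
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on the selected columns.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    a = np.zeros(Phi.shape[1])
    a[support] = coef
    return a, sorted(support)


# Toy example with orthonormal columns, where recovery is exact.
Phi_demo = np.eye(4)
y_demo = np.array([0.0, 3.0, 0.0, -2.0])
a_demo, support_demo = omp(Phi_demo, y_demo, n_iter=2)
```

Running the loop for exactly $S$ iterations corresponds to the $S$-step setting discussed above; whether the selected support is correct for a general $\Phi$ depends on the RIP-type conditions cited in the text.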



C. The Locally Competitive Algorithm 

1) LCA structure and dynamics: The LCA can be viewed as a system of nodes that evolve according to a set of coupled, nonlinear Ordinary Differential Equations (ODEs). Specifically, the system has a set of internal state variables, $u_n(t)$ for $n = 1, \ldots, N$, where each node is associated with a single dictionary element $\Phi_n$. Each node produces an output variable $a_n(t)$ for $n = 1, \ldots, N$ through a nonlinear pointwise activation function $T_\lambda(\cdot)$. The dynamics of the internal state variables are defined by: 

$$\tau \dot{u}(t) = -u(t) - (\Phi^T \Phi - I) a(t) + \Phi^T y, \qquad a(t) = T_\lambda(u(t)). \qquad (4)$$

The time constant $\tau$ is determined by the physical properties of the solver implementing the system. It does not affect the mathematical analysis of the system, so we often take $\tau = 1$ except when we want to stress its influence on the convergence speed. We will assume throughout that the columns of $\Phi = [\Phi_1, \ldots, \Phi_N]$ have unit norm: $\|\Phi_n\|_2 = 1$. The LCA architecture is illustrated in Fig. 1. The activation function used to solve $\ell_1$-minimization is the soft-thresholding function [9]: 

$$a_n(t) = T_\lambda(u_n(t)) = \begin{cases} 0, & |u_n(t)| \le \lambda, \\ u_n(t) - \lambda z_n(t), & |u_n(t)| > \lambda, \end{cases} \qquad (5)$$

where $z_n(t)$ is the sign of the $n$th internal state variable, $z_n(t) = \mathrm{sign}(u_n(t))$. Though $\ell_1$-minimization will be our focus, recent work has also shown that many other sparsity-inducing penalty functions can be minimized in the same system by changing the form of $T_\lambda(\cdot)$ [27]. 
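
To make the dynamics (4) and the activation (5) concrete, the following sketch integrates the LCA equations numerically. The paper targets a continuous-time (analog) system; the forward-Euler discretization, the step size `dt`, the horizon `t_end`, and the function names are assumptions chosen only for illustration.

```python
import numpy as np


def soft_threshold(u, lam):
    """Soft-thresholding activation T_lambda of (5), applied entrywise."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def simulate_lca(Phi, y, lam, tau=1.0, dt=1e-2, t_end=20.0, u0=None):
    """Forward-Euler integration of tau * du/dt = -u - (Phi^T Phi - I) a + Phi^T y."""
    n = Phi.shape[1]
    u = np.zeros(n) if u0 is None else np.array(u0, dtype=float)
    gram_minus_identity = Phi.T @ Phi - np.eye(n)
    drive = Phi.T @ y
    for _ in range(int(round(t_end / dt))):
        a = soft_threshold(u, lam)
        u = u + (dt / tau) * (-u - gram_minus_identity @ a + drive)
    return u, soft_threshold(u, lam)


# With orthonormal columns the feedback term vanishes, so u(t) relaxes to
# Phi^T y and the output is the soft-thresholded input.
Phi_demo = np.eye(3)
y_demo = np.array([2.0, -0.5, 1.2])
u_final, a_final = simulate_lca(Phi_demo, y_demo, lam=1.0)
```

Starting from `u0=None` corresponds to starting the system at rest. For a general $\Phi$ the step size has to be small relative to $\tau$ for the discretization to track the continuous-time trajectory.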

It can be seen from (5) that the activation function is composed of two operating regions. When $|u_n| \le \lambda$, the output $a_n$ is zero and we call the node inactive. When $|u_n| > \lambda$, the output $a_n$ is strictly increasing with $u_n$ and we call the node active. Denote by $\Gamma$ the current active set (i.e., the set of indices $\Gamma(t) = \{k \in [1, N] : |u_k(t)| > \lambda\}$), and denote by $\Gamma^c(t)$ the inactive set consisting of nodes that are below threshold. While the active set changes with time as the network evolves, for the sake of readability and when it is clear from the context, we omit the dependence on time in the notation and just write the active set as $\Gamma$. The sequence of switching times at which the system moves from the set of active nodes $\Gamma_{k-1}$ to $\Gamma_k$ is denoted by $\{t_k\}_{k \in \mathbb{N}}$. In the following, we will also denote by $\Phi_{\mathcal{T}}$ the matrix composed of the columns of $\Phi$ indexed by the set $\mathcal{T}$, setting all the other entries to zero. Similarly, $u_{\mathcal{T}}$ and $a_{\mathcal{T}}$ refer to the elements in the original vectors indexed by $\mathcal{T}$, setting other entries to zero. 

The LCA is a type of switched linear system [28], where the dynamics are a linear system that changes every time a node crosses threshold (i.e., moves into or out of the active set). Between switching times, the active set $\Gamma$ is fixed and the ODE (4) can be rewritten separately for nodes in the active and inactive sets (with $\tau = 1$): 

$$\dot{a}_\Gamma(t) = -\Phi_\Gamma^T \Phi_\Gamma a_\Gamma(t) + \Phi_\Gamma^T y - \lambda z_\Gamma(t), \qquad (6)$$

$$\dot{u}_{\Gamma^c}(t) = -u_{\Gamma^c}(t) - \Phi_{\Gamma^c}^T \Phi_\Gamma a_\Gamma(t) + \Phi_{\Gamma^c}^T y. \qquad (7)$$

Because only nodes in the active set produce feedback, the dynamics on $\Gamma$ are decoupled from the inactive set until the next switching time. 

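On the active set $\Gamma$, the solution to the linear ODE (6) between switching times $t_k$ and $t_{k+1}$ is given by (see Appendix A): 

$$a_\Gamma(t) = e^{-A(t - t_k)} a_\Gamma^{t_k} + \left( I - e^{-A(t - t_k)} \right) A^{-1} \left( \Phi_\Gamma^T y - \lambda z_\Gamma \right), \qquad (8)$$

where $A = \Phi_\Gamma^T \Phi_\Gamma$ and $a_\Gamma^{t_k} = a_\Gamma(t_k)$. As explained in Appendix A, the term $(I - e^{-A(t - t_k)}) A^{-1}$ is always well-defined, even if the matrix $A$ is singular (similarly to the scalar term $(1 - e^{-At})/A$, which is well-defined even when $A$ is zero). In the case where $\Phi_\Gamma^T \Phi_\Gamma$ is non-singular, the term $a_\Gamma^\infty = A^{-1} (\Phi_\Gamma^T y - \lambda z_\Gamma)$ can be interpreted as the steady state of (6) if the active set and sign vector $z_\Gamma$ remain unchanged until convergence. On the other hand, on the inactive set $\Gamma^c$, the solution to the linear ODE (7) between switching times $t_k$ and $t_{k+1}$ is given by: 

$$u_{\Gamma^c}(t) = e^{-(t - t_k)} u_{\Gamma^c}^{t_k} + e^{-t} \int_{t_k}^{t} e^{v} \rho_{\Gamma^c}(v) \, dv, \qquad (9)$$

where $\rho_{\Gamma^c}(v) = \Phi_{\Gamma^c}^T (y - \Phi_\Gamma a_\Gamma(v))$ and $u_{\Gamma^c}^{t_k} = u_{\Gamma^c}(t_k)$. Letting $t$ go to infinity in equations (8) and (9), the fixed point $a^*$ supported on the final active set $\Gamma_*$ must satisfy: 

$$a^*_{\Gamma_*} = (\Phi_{\Gamma_*}^T \Phi_{\Gamma_*})^{-1} (\Phi_{\Gamma_*}^T y - \lambda z_{\Gamma_*}),$$

$$u^*_{\Gamma_*^c} = \Phi_{\Gamma_*^c}^T (y - \Phi_{\Gamma_*} a^*_{\Gamma_*}).$$

Since a node $j$ is in the inactive set $\Gamma_*^c$ if and only if $|u_j^*| \le \lambda$, the two equations above translate immediately to: 

$$\Phi_{\Gamma_*}^T (y - \Phi_{\Gamma_*} a^*_{\Gamma_*}) = \lambda z_{\Gamma_*}, \qquad \|\Phi_{\Gamma_*^c}^T (y - \Phi_{\Gamma_*} a^*_{\Gamma_*})\|_\infty \le \lambda, \qquad (10)$$

which are the two well-known optimality conditions for $a^*$ to be the solution to (1) [29]. 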
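
The closed form (8) and the optimality conditions (10) can both be checked numerically. The sketch below evaluates (8) through an eigendecomposition of the symmetric matrix $A$, which makes the remark about singular $A$ explicit: on each eigen-direction the scalar gain $(1 - e^{-ws})/w$ is replaced by its limit $s$ when the eigenvalue $w$ is zero. It also tests the two conditions in (10) for a candidate solution. The function names, the tolerance values, and the use of NumPy are assumptions for illustration, and the time constant is taken as $\tau = 1$.

```python
import numpy as np


def active_set_solution(A, a_start, b, elapsed):
    """Evaluate (8) after `elapsed` time units on a fixed active set.

    A is the symmetric Gram matrix Phi_Gamma^T Phi_Gamma, a_start is the
    output at the last switching time, and b = Phi_Gamma^T y - lam * z_Gamma.
    """
    eigvals, V = np.linalg.eigh(A)
    decay = np.exp(-eigvals * elapsed)
    # Scalar gain (1 - e^{-w s}) / w, replaced by its limit s when w = 0.
    nonzero = np.abs(eigvals) > 1e-12
    safe = np.where(nonzero, eigvals, 1.0)
    gain = np.where(nonzero, -np.expm1(-eigvals * elapsed) / safe, elapsed)
    return V @ (decay * (V.T @ a_start)) + V @ (gain * (V.T @ b))


def satisfies_optimality(Phi, y, a, lam, tol=1e-8):
    """Check the two optimality conditions (10) for a candidate solution a."""
    corr = Phi.T @ (y - Phi @ a)
    active = a != 0
    on_support = np.allclose(corr[active], lam * np.sign(a[active]), atol=tol)
    off_support = np.all(np.abs(corr[~active]) <= lam + tol)
    return bool(on_support and off_support)


# Singular A: the second direction has eigenvalue 0 and grows linearly in time.
a_singular = active_set_solution(
    np.diag([2.0, 0.0]), np.array([1.0, 1.0]), np.array([4.0, 3.0]), elapsed=1.0
)

# Non-singular A: for large elapsed time the output approaches A^{-1} b.
a_steady = active_set_solution(
    np.diag([2.0, 1.0]), np.zeros(2), np.array([4.0, 3.0]), elapsed=50.0
)
```

In the non-singular case the limit of `active_set_solution` for large `elapsed` is the steady state $a_\Gamma^\infty$ discussed above.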

2) LCA Convergence Speed: As with their digital counterparts, it is desirable to know how fast continuous-time systems such as the LCA converge. For a system described by a differential equation of the form 

$$\dot{x}(t) = F(x), \qquad x \in \mathbb{R}^N, \qquad (11)$$

we say that the dynamical system (11) is exponentially convergent to the solution $x^*$ if there exists a constant $c > 0$ such that for any initial point $x(0)$, there exists a constant $\kappa_0 > 0$ (which may depend on $x(0)$) for which the trajectory $x(t)$ of the system satisfies $\|x(t) - x^*\| \le \kappa_0 e^{-ct}$, $\forall t \ge 0$. The constant $c$ is referred to as the convergence speed of the system. 
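
In simulation, the convergence speed $c$ in this definition can be estimated empirically from a recorded error trajectory. The sketch below fits a line to the logarithm of the error, which gives an estimate of $c$ and $\kappa_0$ when the error decays like $\kappa_0 e^{-ct}$. This is only an empirical estimate, not the guaranteed bound of the definition; the function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def fit_convergence_speed(times, errors):
    """Least-squares estimate of (c, kappa0) in errors ~ kappa0 * exp(-c * t)."""
    slope, intercept = np.polyfit(times, np.log(errors), 1)
    return -slope, np.exp(intercept)


# Synthetic error curve with known speed 0.7 and constant 3.
t_demo = np.linspace(0.0, 5.0, 50)
c_hat, kappa_hat = fit_convergence_speed(t_demo, 3.0 * np.exp(-0.7 * t_demo))
```

Applied to a trajectory of $\|u(t) - u^*\|_2$, the fitted slope can be compared against a theoretical rate, keeping in mind that a finite-time fit may overestimate the asymptotic speed.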

The LCA has been shown to be exponentially convergent [10], with a convergence speed that depends on the transient activity in the system. To state this result, define $\Gamma_*$ as the active set of the solution $a^*$ to (1), and define the constant $d$ as the smallest positive constant such that for any active set $\Gamma$ visited by the LCA and any vector $x$ in $\mathbb{R}^N$ supported on $\tilde{\Gamma} = \Gamma \cup \Gamma_*$, we have: 

$$(1 - d) \|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + d) \|x\|_2^2. \qquad (12)$$

Although this definition looks similar to the definition of the RIP constant, it is important to note that (12) must hold not for a general index set (as in (2)) but rather for the active sets visited by the LCA over all time $t$ during convergence. If such a $d$ exists, Theorem 3 of [10] applied to $\ell_1$-minimization becomes: 

Theorem 1: Provided that $d < 1$, where $d$ is defined in (12), the LCA system defined in (4) converges exponentially fast with convergence speed $(1 - d)/\tau$, i.e., there exists a constant $\kappa > 0$ such that 

$$\|u(t) - u^*\|_2 \le \kappa e^{-(1 - d) t / \tau}.$$

While it is difficult to characterize the path of the LCA in general, if the size of the active sets visited during convergence can be bounded then $d$ can be related to the RIP constant and Theorem 1 can be used to bound the convergence speed. This is precisely what is done in the two main results of this paper. 

III. Bounding LCA Active Set Sizes and Convergence Speed 

In this section, we state our two main theorems that bound the size of the LCA active set during convergence. Accompanying each theorem is a discussion of the implications of the result and a comparison to existing results for the digital algorithms discussed in Section II. Section III-C uses the theorems to establish strong bounds on the convergence speed of the LCA in CS recovery. 

For all the results in this section, we consider the vector $a^\dagger$ in $\mathbb{R}^N$ to be the "true" underlying signal, or original signal, that has $S$ non-zero coefficients supported on the set $\Gamma_\dagger$, referred to as the optimal support. This signal generates noisy measurements $y$ in $\mathbb{R}^M$: 

$$y = \Phi a^\dagger + e = \Phi_{\Gamma_\dagger} a^\dagger_{\Gamma_\dagger} + e$$

for some noise vector $e \in \mathbb{R}^M$. Our analysis considers the general case where the measurements are corrupted by noise, but remains valid in the noise-free case, when $e = 0$. 

A. Bounding the active set by the optimal support 

Our first result defines a relationship between the RIP constant $\delta$ of order $(S + 1)$, the sparsity level $S$ of the vector $a^\dagger$, and the threshold $\lambda$ that guarantees that nodes outside the optimal support $\Gamma_\dagger$ never become active throughout convergence. We also define the following quantities that will appear several times in the proofs of the theorems: 

$$\alpha = \alpha_\delta = (1 + \delta)(1 - \delta)^{-2},$$

$$C_\delta(p) = \alpha \left( \|a^\dagger\|_2 + \sqrt{1 - \delta} \, \|e\|_2 + \lambda \sqrt{p} \right).$$

Note that $\alpha > 1$. The dependence on the noise vector appears clearly so that the theorem can be specialized to the noise-free case in a straightforward manner. 

Theorem 2: Assume that the dictionary $\Phi$ satisfies the RIP with parameters $(S + 1, \delta)$ and that the support $\Gamma(0)$ of the initial output states $a(0)$ is a subset of the optimal support $\Gamma_\dagger$. If the following two conditions between the original signal $a^\dagger$, the threshold $\lambda$, the noise $e$, the sparsity $S$ and the RIP constant $\delta$ are satisfied: 

$$\|a^\dagger - a(0)\|_2 \le C_\delta(S), \qquad (13)$$

$$\left( 1 - \alpha \delta \sqrt{S} \right) \lambda \ge \alpha \delta \left( \|a^\dagger\|_2 + \sqrt{1 - \delta} \, \|e\|_2 \right) + \|\Phi^T e\|_\infty, \qquad (14)$$

then nodes in $\Gamma_\dagger^c$ never cross threshold (i.e., enter $\Gamma$). 

The theorem is proven in Appendix C. Conditions (13) and (14) involve complex relationships between several parameters. To clarify the implications of this result for the system parameters, we study below the special case where the matrix $\Phi$ is a random matrix with an RIP constant of the form (3). 

Starting the system at rest: When the system starts at rest ($u(0) = 0$), condition (13) becomes: 

$$\|a^\dagger\|_2 \le \alpha \left( \|a^\dagger\|_2 + \sqrt{1 - \delta} \, \|e\|_2 + \lambda \sqrt{S} \right).$$

Since $\alpha > 1$, this condition is always true and (13) holds when the system starts at rest. 

Condition on the sparsity: In the noise-free case, (14) gives a condition on the sparsity $S$ of the underlying signal $a^\dagger$ for the left-hand side to be positive: 

$$\sqrt{S} < \frac{(1 - \delta)^2}{(1 + \delta) \delta} = \frac{1}{\alpha \delta}. \qquad (15)$$

As a reference, for an RIP constant $\delta < 1/2$, we have $\alpha < 6$, and for $\delta < 0.1$, we have $\alpha < 1.358$. Using the estimate in (3) of the RIP constant for random matrices, $\delta \sim \sqrt{S \log(N/S)/M}$, yields: 

$$S \sqrt{\log(N/S)} < \frac{(1 - \delta)^2}{1 + \delta} \sqrt{M} \sim \sqrt{M}.$$

This shows that the number of measurements must be on the order of $O(S^2 \log(N/S))$ when $\Phi$ is random. 
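
The quantities in this discussion are simple to evaluate numerically. The sketch below computes $\alpha_\delta = (1 + \delta)(1 - \delta)^{-2}$, the largest sparsity allowed by the condition $\sqrt{S} < 1/(\alpha \delta)$ in (15), and the measurement count suggested by inverting the heuristic estimate $\delta \sim \sqrt{S \log(N/S)/M}$ with all constants ignored. The function names are hypothetical, and the last function is only an order-of-magnitude guide, not a guarantee.

```python
import math


def alpha(delta):
    """alpha_delta = (1 + delta) / (1 - delta)^2, as defined before Theorem 2."""
    return (1.0 + delta) / (1.0 - delta) ** 2


def max_sparsity(delta):
    """Largest integer S with sqrt(S) < 1 / (alpha * delta), i.e. condition (15)."""
    bound = 1.0 / (alpha(delta) * delta) ** 2
    return math.ceil(bound) - 1


def measurements_for_delta(delta, S, N):
    """Invert the heuristic RIP estimate, ignoring constants: M ~ S log(N/S) / delta^2."""
    return S * math.log(N / S) / delta ** 2


alpha_half = alpha(0.5)      # boundary value quoted in the text for delta < 1/2
alpha_tenth = alpha(0.1)     # boundary value quoted in the text for delta < 0.1
s_max_tenth = max_sparsity(0.1)
```

The two values of `alpha` reproduce the reference constants 6 and 1.358 quoted above, and `max_sparsity` shows how quickly the admissible sparsity shrinks as $\delta$ grows.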

Condition on the threshold $\lambda$, noise-free case: If the threshold is set too high, then the solution to (1) is simply zero and the theorem is trivial. We would like to know if $\lambda$ can be set low enough to recover interesting solutions. In the noise-free case, condition (14) becomes 

$$\lambda \ge \frac{\alpha \delta}{1 - \alpha \delta \sqrt{S}} \|a^\dagger\|_2.$$

Assuming that (15) holds, the threshold must obey: 

$$\lambda \gtrsim \frac{1}{\sqrt{S}} \|a^\dagger\|_2. \qquad (16)$$

The notation $\gtrsim$ means greater up to a constant factor. From (10), recall that the solution $a^*$ is simply a thresholded version of $a^\dagger$, 

$$a^*_{\Gamma_\dagger} = a^\dagger_{\Gamma_\dagger} - \lambda (\Phi_{\Gamma_\dagger}^T \Phi_{\Gamma_\dagger})^{-1} z_{\Gamma_\dagger}.$$

As a consequence, a threshold of the form in (16) guarantees that no more than a constant portion of the energy in the original signal will be cut out in the solution $a^*$. 

If some of the nodes have very small amplitudes, they do not contribute much to the signal energy and thresholding them out may be acceptable. It is instructive then to look at the scenario where all the non-zero entries in $a^\dagger$ have the same magnitude. In this case, the energy is uniformly spread across the nodes and we would like the threshold to be low enough to recover them all. Without loss of generality, assume that $\|a^\dagger\|_2 = 1$ so that each non-zero element of $a^\dagger$ is equal to $\pm 1/\sqrt{S}$. For the solution $a^*$ to not be identically zero, the threshold must remain below $1/\sqrt{S}$. Taking $\lambda = r/\sqrt{S}$, for some $1 > r > 0$, (14) yields: 

$$\frac{r}{\sqrt{S}} \ge \frac{\alpha \delta}{1 - \alpha \delta \sqrt{S}}.$$

After reorganizing the terms, we obtain: 

$$\sqrt{S} \le \frac{r}{1 + r} \frac{1}{\alpha \delta}.$$

This is just slightly stronger than condition (15) and yields again a number of measurements on the order of $O(S^2 \log(N/S))$ for an RIP constant as in (3). 

Condition on the threshold $\lambda$, noisy case: Assume that the measurements are corrupted by Gaussian white noise $e$ whose entries have variance $\sigma^2$. Then the norm $\|e\|_2$ of the noise vector is on the order of $\sqrt{M} \sigma$, and $\|\Phi^T e\|_\infty$ is on the order of $\sqrt{\log N} \, \sigma$ with high probability. In this scenario (still with $\|a^\dagger\|_2 = 1$), (14) requires that the threshold satisfy: 

$$\left( 1 - \alpha \delta \sqrt{S} \right) \lambda \ge \alpha \delta + \left( \alpha \delta \sqrt{1 - \delta} \sqrt{M} + \sqrt{\log N} \right) \sigma.$$

In order to keep a number of measurements comparable to the noise-free case, we look for conditions on the noise variance so that the new extra terms are on the same order as $\alpha \delta$. We look for a constant $\kappa > 0$ such that: 

$$\left( \alpha \delta \sqrt{1 - \delta} \sqrt{M} + \sqrt{\log N} \right) \sigma = \kappa \alpha \delta. \qquad (17)$$

Assuming again a threshold of the form $\lambda = r/\sqrt{S}$ for $1 > r > 0$ yields: 

$$\sqrt{S} \le \frac{r}{\kappa + 1 + r} \frac{1}{\alpha \delta},$$

which is slightly less than the sparsity allowed in the noise-free case, but also yields a number of measurements on the order of $O(S^2 \log(N/S))$. Taking $\delta \sim \sqrt{S \log(N/S)/M}$, $S \ll N$, $M \sim S^2 \log(N/S)$, $\alpha \sim 1$, $\alpha \sqrt{1 - \delta} \sim 1$ and reorganizing the terms in (17) yields a noise variance of: 

$$\sigma = \frac{\kappa \alpha \delta}{\alpha \delta \sqrt{1 - \delta} \sqrt{M} + \sqrt{\log N}} \sim \frac{\kappa \sqrt{S \log(N/S)/M}}{\sqrt{S \log(N/S)} + \sqrt{\log N}} = \frac{\kappa}{\sqrt{M}} \cdot \frac{1}{1 + \sqrt{\frac{\log N}{S \log(N/S)}}}.$$

As a consequence, the total energy allowed in the noise vector is on the order of $\|e\|_2 \sim O\left( \kappa / (1 + 1/\sqrt{S}) \right)$, which means that the energy of the noise can be approximately of the same order as the energy of the signal. 

A sharper estimate: The result in Theorem 2 is stated for any fixed noise vector $e$. In the case where the noise $e$ is assumed to be a Gaussian random vector, the proof of Lemma 1 in Appendix B hints that the bound used there can be improved upon. The essential step consists in bounding the term $\|(\Phi_\Gamma^T \Phi_\Gamma)^{-1} \Phi_\Gamma^T e\|_2$. It is an easy calculation that 

$$E\left\{ \|(\Phi_\Gamma^T \Phi_\Gamma)^{-1} \Phi_\Gamma^T e\|_2^2 \right\} = \sigma^2 \, \mathrm{Trace}\left( (\Phi_\Gamma^T \Phi_\Gamma)^{-1} \right) \le \frac{\sigma^2 S}{1 - \delta}.$$

Moreover, standard tail inequalities [19] show that this random variable concentrates around its mean as well, so when the noise is Gaussian, we can replace $\sqrt{1 - \delta} \, \|e\|_2$ by $\sqrt{S} \sigma$ with high probability in (14). Going over the equations in the previous paragraph, we see that we obtain a noise variance of the form: 

$$\sigma = \frac{\kappa \alpha \delta}{\alpha \delta \sqrt{S} + \sqrt{\log N}} \sim \frac{\kappa}{\sqrt{S}} \cdot \frac{1}{1 + \sqrt{\log N}}.$$

As a consequence, the total energy allowed in the noise vector is now on the order of $\|e\|_2 \sim O\left( \kappa \sqrt{M/S} / (1 + \sqrt{\log N}) \right)$, which can increase with the number of measurements $M$. 

Comparison to digital solvers: In this study, we showed that the requirement on the sparsity for Theorem 2 to hold is $\sqrt{S} < \frac{(1 - \delta_{S+1})^2}{(1 + \delta_{S+1}) \delta_{S+1}}$. When $\Phi$ is random, this leads to a number of measurements on the order of $O(S^2 \log(N/S))$. This result strongly resembles the condition for the Homotopy algorithm to satisfy the $S$-step property [2], which requires that $S \le (1 + \mu^{-1})/2$, where $\mu$ is the mutual coherence of $\Phi$, and leads to the same number of measurements. For $M \sim O(S^2 \log N)$, the Homotopy algorithm on the parameter $\lambda$ behaves like a pursuit algorithm, where nodes are added to the active support and the solution evolves in a piecewise-linear manner. Likewise, the LCA solution evolves according to a continuous switched linear system and nodes are added to the support until the solution is reached. Both results ensure that only nodes present in the final solution enter the active set. OMP was also shown to recover an $S$-sparse signal in exactly $S$ steps provided that $\Phi$ satisfies the RIP with $\sqrt{S} < 1/(3 \delta_{S+1})$ (compare to $\sqrt{S} < 1/(1.358 \, \delta_{S+1})$ if $\delta < 0.1$ in (15)), which also leads to $O(S^2 \log N)$ measurements [16]. Consequently, despite the continuous-time nature of the LCA trajectories, we get bounds on the RIP constant comparable to those for the Homotopy and OMP algorithms. 

Decreasing the threshold $\lambda$: Note that from the proof in Appendix C, Theorem 2 holds if the inequality 

$$\lambda \ge \delta \, \|a^\dagger - a_{\Gamma_k}(t)\|_2$$


is satisfied for all time $t$. Since the quantity on the right-hand side is expected to decrease exponentially fast to its minimum value [10], we can consider also decreasing the threshold $\lambda$ according to an exponential decay as the system evolves. A decrease in the threshold would allow the system to potentially recover more nodes from $a^\dagger$ while keeping the size of the active set below $S$, as well as yielding faster convergence. This is indeed observed in practice (see Section IV). Interestingly, similar observations have been made for digital solvers (e.g., in [30], the threshold is decreased according to a geometric progression to speed up recovery). However, there has been no analytic justification for the observed increase in speed or for how to choose the decay rate. In our case, even if the proof suggests the potential advantage of decreasing the threshold according to an exponential decay, the additional dynamics on the threshold would drastically change the nature of the analysis, starting with the proof of convergence in [10]. 



B. Bounding the size of the active set 

The main result of this section gives a condition on the threshold $\lambda$ to guarantee that the active set never contains more than $q$ nodes throughout convergence. The dictionary $\Phi$ is now assumed to satisfy the RIP of order $(S + q)$, where $S$ is the sparsity of the original signal $a^\dagger$, with RIP constant $\delta$. In contrast to the previous result (Theorem 2), active nodes are not restricted to be part of the final solution. In contrast to the analysis for digital solvers, we are not interested in bounding the number of "switches" before convergence. In our case, bounding the size of the active set is enough to still guarantee exponential convergence. 

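Theorem 3: Assume that the dictionary $\Phi$ satisfies the RIP with parameters $(S + q, \delta)$, for some $q > 0$. If the initial state $u(0)$ does not contain more than $q$ active nodes and the following two conditions between the original signal $a^\dagger$, the initial state $u(0)$, the threshold $\lambda$, the noise $e$, the parameter $q$ and the RIP constant $\delta$ are satisfied: 

$$\|u(0)\|_2 \le \sqrt{q} \, \lambda, \qquad (18)$$

$$\lambda \ge \frac{1 + \delta}{(1 - 3\delta) \sqrt{q}} \left( \|a^\dagger\|_2 + \sqrt{1 - \delta} \, \|e\|_2 \right), \qquad (19)$$

then the active set $\Gamma$ never contains more than $q$ nodes for all time $t$. 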

The theorem is proven in Appendix D. As we will show in the discussion below, useful values for $q$ are typically small multiples of $S$. Condition (19) again involves a complex relationship between the various parameters. As for the previous theorem, we will interpret the results in the case where the RIP constant has the form in (3). We also assume that we have no initial guess and that the system is started at rest. 

Starting the system at rest: It is clear that when the system starts at rest ($u(0) = 0$), condition (18) holds. 

Condition on the RIP constant $\delta$: For the right-hand side in (19) to be positive, we need the RIP constant $\delta$ of order $(S + q)$ to satisfy: 

$$\delta < \frac{1}{3}.$$

In the case of a random matrix $\Phi$ with RIP constant as in (3), (19) yields a number of measurements on the order of $O((S + q) \log(N/(S + q)))$, which is close to the optimal number of measurements necessary for $\ell_1$-minimization to recover a sparse signal. 

Condition on the threshold $\lambda$, noise-free case: As in
the discussion for Theorem 2, we want to know if condition
(19) allows for a threshold low enough to recover interesting
solutions. In the noise-free case, this condition becomes

$$\lambda \ge \frac{1+\delta}{1-3\delta}\,\frac{1}{\sqrt{q}}\,\|a^\dagger\|_2.$$

For an RIP constant as in (3), the threshold can be chosen on
the order of:

$$\lambda \gtrsim \frac{1}{\sqrt{q}}\,\|a^\dagger\|_2.$$

This guarantees again that a constant portion of the energy is
recovered. Looking again at the scenario where all the non-zero
entries in $a^\dagger$ have the same magnitude (equal to $1/\sqrt{S}$)
and taking a threshold of the form $\lambda = r/\sqrt{S}$ (with $1 > r > 0$),
(19) yields:

$$\frac{r}{\sqrt{S}} \ge \frac{1+\delta}{1-3\delta}\,\frac{1}{\sqrt{q}}.$$

If in addition, we let $q = \beta S$ and rearrange the terms, we get

$$\delta \le \frac{r\sqrt{\beta} - 1}{1 + 3r\sqrt{\beta}}. \tag{20}$$

For the right-hand side to be positive, it suffices to have
$\beta > 1/r^2$. If $r$ is close to 1, then $\beta > 1$ and if $r = 0.5$,
then $\beta > 4$. When the RIP constant is given by (3), the
inequality obtained yields again a number of measurements
on the order of $O((S + q)\log(N/(S + q)))$ with a slightly
bigger overhead constant.
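As an illustration, the bound on the RIP constant in (20), $\delta \le (r\sqrt{\beta} - 1)/(1 + 3r\sqrt{\beta})$, can be evaluated numerically for a few choices of $r$ and $\beta$. The following is a minimal sketch; the function names `max_rip_constant` and `min_beta` are hypothetical helpers introduced here and are not part of the original experiments.

```python
import math


def max_rip_constant(r: float, beta: float) -> float:
    """Right-hand side of (20): (r*sqrt(beta) - 1) / (1 + 3*r*sqrt(beta)).

    r    : threshold scaling, lambda = r / sqrt(S), with 0 < r < 1
    beta : ratio q / S between the allowed active-set size and the sparsity
    """
    rb = r * math.sqrt(beta)
    return (rb - 1.0) / (1.0 + 3.0 * rb)


def min_beta(r: float) -> float:
    """Smallest beta for which the right-hand side of (20) is non-negative."""
    return 1.0 / r ** 2


if __name__ == "__main__":
    # Values used in the comparison with digital solvers: beta = 30, r = 0.8.
    print(round(max_rip_constant(0.8, 30.0), 3))  # prints 0.239
    print(min_beta(0.5))                          # prints 4.0
```

With $\beta = 30$ and $r = 0.8$ the bound evaluates to roughly 0.239, and with $r = 0.5$ the bound only becomes positive once $\beta$ exceeds 4, in agreement with the discussion above.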

Condition on the threshold $\lambda$, noisy case: Assume again
that the measurements are corrupted by a Gaussian white
noise $\epsilon$, whose entries have variance $\sigma^2$, and that $\|a^\dagger\|_2 = 1$.
Condition (19) becomes:

$$\lambda \ge \frac{1+\delta}{1-3\delta}\,\frac{1}{\sqrt{q}}\left(1 + \sqrt{1-\delta}\,\sqrt{M}\,\sigma\right).$$

In order to keep a number of measurements on the same order
as in the noise-free case, we look for conditions on the noise
variance so that for some $\kappa > 0$:

$$\sqrt{1-\delta}\,\sqrt{M}\,\sigma = \kappa.$$

Reorganizing the terms yields a noise standard deviation of:

$$\sigma = \frac{1}{\sqrt{1-\delta}}\,\frac{\kappa}{\sqrt{M}}.$$

As a consequence, the total energy allowed in the noise vector
is on the order of $\|\epsilon\|_2 \sim O(1)$, which is the same order as
the energy of the signal.
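To make the threshold condition concrete, the sketch below evaluates the smallest threshold allowed by a condition of the form $\lambda \ge \frac{1+\delta}{1-3\delta}\frac{1}{\sqrt{q}}(\|a^\dagger\|_2 + \sqrt{1-\delta}\,\|\epsilon\|_2)$. The function name `min_threshold` and its argument names are assumptions made for illustration; for Gaussian noise one may pass `noise_norm = sqrt(M) * sigma` as the typical noise energy.

```python
import math


def min_threshold(delta: float, q: int, signal_norm: float,
                  noise_norm: float = 0.0) -> float:
    """Smallest threshold satisfying the condition stated in the lead-in.

    delta       : RIP constant of order S + q (must lie in [0, 1/3))
    q           : maximum number of active nodes allowed
    signal_norm : l2-norm of the original signal
    noise_norm  : l2-norm of the noise vector (0 in the noise-free case)
    """
    if not 0.0 <= delta < 1.0 / 3.0:
        raise ValueError("the condition requires 0 <= delta < 1/3")
    gain = (1.0 + delta) / (1.0 - 3.0 * delta)
    return gain / math.sqrt(q) * (signal_norm
                                  + math.sqrt(1.0 - delta) * noise_norm)


if __name__ == "__main__":
    # Noise-free case with a unit-norm signal.
    print(min_threshold(0.2, 25, 1.0))
    # Noisy case: M = 200 measurements, sigma = 0.025 (typical noise energy).
    print(min_threshold(0.2, 25, 1.0, math.sqrt(200) * 0.025))
```

The required threshold grows without bound as $\delta$ approaches $1/3$ and shrinks as $1/\sqrt{q}$ when more nodes are allowed in the active set.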

A sharper estimate: Here again, assuming a random
Gaussian noise vector $\epsilon$ in the proof of the theorem leads to a
more accurate bound. Using the same concentration argument
as previously, we can replace $\sqrt{1-\delta}\,\|\epsilon\|_2$ by $\sqrt{q}\,\sigma$ with high
probability in (19) when the noise is Gaussian. This yields a
new noise standard deviation of the form $\sigma \sim \kappa/\sqrt{q}$ and the energy
in the noise vector becomes $\|\epsilon\|_2 \sim O(\sqrt{M/q})$, which can
again increase with the number of measurements.

Comparison to digital solvers: Theorem 3 gives conditions
for the size of the active set to remain bounded throughout
the convergence of the LCA system. The study above shows
that such conditions can be achieved in typical CS scenarios
when the RIP constant of $\Phi$ is a small constant. For instance,
using $\beta = 30$ and $r = 0.8$ in Eq. (20) yields $\delta_{31S} < 0.23$.
In comparison, OMP has been shown to converge for
$\delta_{31S} < 1/3$ [24]. The result for ROMP has a slightly worse form
with $\delta_{8S} < 0.01/\sqrt{\log S}$ [31]. Finally, CoSaMP was shown to
converge for $\delta_{4S} < 0.1$ [26].

For all those algorithms, the RIP constants reported lead
to the same order of measurements $O(S \log N)$. This is
another interesting parallel between the LCA and its digital
equivalents. In all cases, letting more than the $S$ nodes of
the true solution enter the active support still leads to good
convergence results, while yielding better scaling on the RIP
constant and number of measurements. However, the proofs
for the digital solvers show convergence to a solution close to
the true solution in the $\ell_2$- or $\ell_1$-norm (see [24], [26], [31] for
the exact bounds). On the other hand, the conditions reported
here for the LCA are only needed to guarantee a bound on
the exponential speed of convergence. As stated before, the
algorithm is already guaranteed to converge to the solution
to (1) without any requirements on the RIP constant. Thus,
the error achieved is linked to the performance guarantees
associated with $\ell_1$-minimization (for instance $\delta_{2S} < \sqrt{2} - 1$),
as discussed in Section II-A.

Decreasing the threshold $\lambda$: As we did for the previous
theorem, we again note that in the proof of Theorem 3, the
value of $\lambda$ required to have no more than $q$ nodes active
depends only on the quantity $\|u(t) - u^*\|_2$, which is known
to decrease exponentially fast [20]. This again suggests
decreasing the threshold $\lambda$ according to an exponential decay as
the dynamical system evolves. Reducing the threshold could
potentially lead to recovering more nodes from $a^\dagger$ while
keeping the size of the active set below $q$, and yielding faster
convergence. This is indeed what we observe in practice (see
Section IV).



C. Consequence of the Convergence speed 

The goal of the study in this paper is to obtain an estimate of
the speed of convergence of the LCA in the context
of CS recovery. In Theorem 2, we showed that under some
conditions, the active sets visited during convergence
never contain more than the $S$ optimal nodes. This result was
generalized to allowing no more than $q$ nodes to become active
in Theorem 3. Such guarantees allow us to apply Theorem 1 to
put a strong bound on the convergence speed. If the conditions
of Theorem 2 are satisfied, then the constant $\delta$ in (12) can be
approximated by the RIP constant of $\Phi$ of order $S$, which
is approximately $\sqrt{S\log(N/S)/M}$ for random matrices of
interest. On the other hand, if the conditions of Theorem 3 are
met, we can approximate $\delta$ by the RIP constant of $\Phi$ of order
$q$, which is approximately $\sqrt{q\log(N/q)/M}$ in the random
matrix cases studied. These estimates can then be used in the
expression for the speed of convergence, $(1 - \delta)/\tau$. This
leads to an estimate for the convergence time of the LCA of

$$O\left(\frac{\tau}{1-\sqrt{S\log(N/S)/M}}\right),$$

where $\tau$ is the time constant of the physical solver.

For informational purposes, the digital solvers Homotopy, 
OMP, ROMP and CoSamp have been proven to have running 
times on the order of O(SMN) flops when the number of 
iterations is finite EH, [24l , (26), [3T[ . This estimate can in 
general be reduced if a fast multiply for $\Phi$ and $\Phi^T$ is available.
It is important to keep in mind that the time constant $\tau$ for
the LCA has the potential to be much smaller than the time to
perform a single matrix multiply for a digital solver [11]. As
a consequence, the scaling properties of the LCA seem more
favorable for large problems.
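The estimate of this subsection can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the approximation $\delta \approx \sqrt{n \log(N/n)/M}$ for a random matrix with $n$ active nodes and ignores the constants hidden in the $O(\cdot)$ notation; the function names are hypothetical.

```python
import math


def rip_estimate(n_active: int, N: int, M: int) -> float:
    """Approximate RIP constant of order n_active: sqrt(n log(N/n) / M)."""
    return math.sqrt(n_active * math.log(N / n_active) / M)


def convergence_rate(n_active: int, N: int, M: int, tau: float = 1.0) -> float:
    """Estimated exponential rate (1 - delta) / tau of the LCA."""
    delta = rip_estimate(n_active, N, M)
    if delta >= 1.0:
        raise ValueError("estimate is only meaningful when delta < 1")
    return (1.0 - delta) / tau


if __name__ == "__main__":
    # Parameters of the simulation section: N = 400, M = 200, S = 5.
    delta = rip_estimate(5, 400, 200)
    rate = convergence_rate(5, 400, 200)
    print(f"delta ~ {delta:.3f}, rate ~ {rate:.3f} per time constant")
    print(f"error shrinks by 1/e roughly every {1.0 / rate:.2f} tau")
```

For $N = 400$, $M = 200$ and $S = 5$ this gives $\delta \approx 0.33$, so the predicted decay is about two thirds as fast as that of an ideal system with $\delta = 0$.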



IV. Simulations 

In this section, we provide simulations that illustrate the 
previous theoretical results.¹ As an example, we use a sparse
vector $a^\dagger$ of length $N = 400$ whose non-zero entries are
generated by randomly selecting $S = 5$ indices and setting
their amplitudes so that $\|a^\dagger\|_2 = 1$. Then, we take $M = 200$
measurements by generating a Gaussian random matrix $\Phi$
of size $200 \times 400$, with entries drawn independently from a
normal distribution and columns normalized to have unit norm.
We also add a Gaussian white noise with standard deviation
$\sigma = 0.025$ to the measurement so that $y = \Phi a^\dagger + \epsilon$. All the
results of this section are obtained by simulating the ODEs 
Q on a digital computer. The algorithm is always started at 
rest with u(0) = 0. 
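The setup described above can be sketched in a few lines of NumPy. This is not the authors' Matlab code: it is a minimal illustration that assumes Gaussian amplitudes for the non-zero entries (the text only specifies the normalization), the soft-thresholding activation, node dynamics of the form $\tau \dot{u} = -u - (\Phi^T\Phi - I)a + \Phi^T y$ with $a = T_\lambda(u)$, and a simple forward-Euler integration with a hypothetical step size `dt`.

```python
import numpy as np


def soft_threshold(u, lam):
    """Soft-thresholding activation a = T_lambda(u)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def make_problem(N=400, M=200, S=5, sigma=0.025, seed=0):
    """Random sparse signal, normalized Gaussian matrix, noisy measurements."""
    rng = np.random.default_rng(seed)
    a_true = np.zeros(N)
    support = rng.choice(N, size=S, replace=False)
    a_true[support] = rng.standard_normal(S)   # assumed amplitude distribution
    a_true /= np.linalg.norm(a_true)           # unit l2-norm signal
    Phi = rng.standard_normal((M, N))
    Phi /= np.linalg.norm(Phi, axis=0)         # unit-norm columns
    y = Phi @ a_true + sigma * rng.standard_normal(M)
    return Phi, y, a_true


def simulate_lca(Phi, y, lam, tau=1.0, dt=0.01, n_tau=10.0):
    """Forward-Euler simulation of the LCA, started at rest (u(0) = 0).

    Returns the final output a and the largest active-set size observed.
    """
    N = Phi.shape[1]
    u = np.zeros(N)
    inhibition = Phi.T @ Phi - np.eye(N)
    drive = Phi.T @ y
    max_active = 0
    for _ in range(int(round(n_tau * tau / dt))):
        a = soft_threshold(u, lam)
        max_active = max(max_active, int(np.count_nonzero(a)))
        u += (dt / tau) * (drive - u - inhibition @ a)
    return soft_threshold(u, lam), max_active


if __name__ == "__main__":
    Phi, y, a_true = make_problem()
    a_hat, max_active = simulate_lca(Phi, y, lam=0.1)
    print("largest active set:", max_active)
    print("recovery error:", np.linalg.norm(a_hat - a_true))
```

The quantity `max_active` is the empirical counterpart of the parameter $q$ studied in Theorem 3, and can be compared against $S$ for different values of the threshold.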

¹Matlab code running the experiments in this section can be downloaded
at http://users.ece.gatech.edu/abalavoine3/code/LCA_CS_exp.zip

Fig. 2. Percentage of the trials where no more than the $S$ nodes from the
optimal support $\Gamma_\dagger$ become active during convergence. The value 1 means
that 100% of the trials satisfied this condition.

Fig. 3. Ratio of the maximum number of active elements $q$ during convergence
over the sparsity level $S$. For instance, a value of 10 in the color bar means
that the biggest active set during convergence contains $10S$ active elements.



A. Effect of the threshold on the size of the active set 

We first explore how the value of the threshold $\lambda$ affects the
size of the active set during convergence, i.e. the maximum
number of nodes that become active while the system is
evolving. In Figs. 2 and 3, we vary the value of the threshold
$\lambda$ and the sparsity level $S$. For each point on the figures,
we simulate 100 random draws of a sparse vector $a^\dagger$ and a
measurement matrix $\Phi$ and assume that no noise is present.
In Fig. 2, we look at the percentage of the 100 trials where
only nodes that are part of the optimal support become
active. For large $S$ (approximately $S > 28$), we observe
that the transition phase for $\lambda$ follows a curve that looks
like $1/\sqrt{S}$ as predicted in (16). However, for small $S$, the
estimate in (16) appears qualitatively different. In this case,
using the fact that $S \ll M$ we can instead approximate (14)



by A > y/S 



logN/S) 
M 



1+S 



which matches 



'lo g N/S 

~M 

the general behavior of the transition phase for small $S$. In
Fig. 3, the color coding represents the ratio of the maximum
number of active elements $q$ during convergence over the
sparsity level $S$. We can see from this plot that the phase
transition follows a $1/\sqrt{S}$ behavior. Moreover, we observe
that for most values of the threshold $\lambda$ and sparsity level
$S$, relatively few elements become active during
convergence ($q$ is mostly contained between $1S$ and $10S$). The
results in Fig. 2 and Fig. 3 confirm the qualitative behavior of
the bounds derived in Theorems 2 and 3.

B. Decreasing the threshold during convergence 

We noted in Sections III-A and III-B that the proofs of
both Theorems 2 and 3 suggest that decreasing the threshold
according to an exponential decay as the system evolves
could still guarantee that the active set remains bounded while
yielding faster convergence. This is indeed what we observe
in practice. To illustrate this fact, we first ran the LCA with a
high threshold value of $\lambda = 0.3$. As can be seen in the first row
of Fig. 4, the active set never contains more than three nodes
that are part of the optimal support, but the final solution is
missing two nodes from the original signal. In the second row,
$\lambda$ is fixed to a low value of 0.08. The final solution recovers
all the nodes from $a^\dagger$. However, the biggest active set now
contains $q = 7$ nodes and the convergence is slower. Finally,
in the last row the threshold is started at 0.3 and decreased
to the value 0.08 according to an exponential decay. The final
solution is the same as the one in row 2 but the active set
in this case never contains more than the five nodes from the
optimal support. Moreover, the system converges faster, in less
than two time constants compared to three time constants in
row 2.

Fig. 4. This figure shows the number of active nodes (left column, plotted
against the number of time constants $\tau$) and the fixed point $a^*$ reached by
the LCA (right column, plotted against the node number), for different choices of
the threshold. The red crosses represent the original signal $a^\dagger$ and the blue
circles are the solutions $a^*$. A fixed threshold $\lambda = 0.3$ was used in the first
row, $\lambda = 0.08$ in the second row, and the threshold was decreased from 0.3
to 0.08 according to an exponential decay in the third row.
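A threshold schedule of the kind used in the third row of Fig. 4 can be sketched as follows. The text only states that the threshold decays exponentially from 0.3 to 0.08; the specific functional form below, the decay `rate`, and the function name `decaying_threshold` are assumptions made for illustration.

```python
import math


def decaying_threshold(t: float, lam_start: float = 0.3,
                       lam_end: float = 0.08, rate: float = 1.0,
                       tau: float = 1.0) -> float:
    """Threshold that relaxes exponentially from lam_start to lam_end.

    t    : current time
    rate : decay rate in units of 1/tau (hypothetical tuning parameter)
    """
    return lam_end + (lam_start - lam_end) * math.exp(-rate * t / tau)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"t = {t:3.1f} tau, lambda = {decaying_threshold(t):.3f}")
```

In a simulation loop, the value returned at each time step would replace the fixed threshold in the activation function, so that the system starts with a high threshold and ends with the low threshold that defines the final solution.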



C. Estimate of the convergence speed 

Finally, we would like to know how well the quantity

$$e^{-(1-\delta)t/\tau}, \tag{21}$$

predicted by Theorem 1 bounds the convergence rate of the
solver, represented by the mean-squared error between the
nodes at time $t$ and the final solution $u^*$:

$$\|u(t) - u^*\|_2.$$

In Fig. 5, this quantity is normalized to start at 1 so as to
compare it with (21). When they are not varying, we fix the
threshold $\lambda = 0.1$, the number of measurements $M = 200$,
the sparsity $S = 5$, and the signal length $N = 400$. For each
experimental curve (solid lines), the mean-squared error is
averaged over 100 trials. We also plot the theoretical decay
(dashed lines) using the expression (21) with

$$\delta = \sqrt{\frac{S\log(N/S)}{M}}. \tag{22}$$

As expected, the theoretical curves approximate the decay
of the experimental mean-squared error. Note that they are
not strict upper bounds since we are only able to use an
estimate for the RIP constant $\delta$. However, this allows us to
check that the experimental curves qualitatively follow the
theoretical predictions as the parameters $N$, $M$ or $S$ are varied
in Fig. 5(a), 5(c) and 5(b) respectively.



In Fig. 5(d) we explore the effect of the threshold $\lambda$ on the
experimental decay. Indeed, as $\lambda$ becomes smaller, more nodes
are able to enter the active set. As a consequence, the estimate
for $\delta$ used in (22) changes to $\delta = \sqrt{q\log(N/q)/M}$ where $q$ is
the maximum number of active elements during convergence.
However, for values of $\lambda$ bigger than 0.06, the bound with $\delta$
in (22) (corresponding to the dark blue dashed line) remains
valid, even though more than $S$ nodes may become active (for
$\lambda = 0.06$, the average over 100 trials for the maximum size
of the active set is $q = 23 = 4.6S$). In addition, the theoretical
decay with $\delta = \sqrt{5S\log(N/(5S))/M}$ (yellow dashed line) is
an upper bound even for very small values of the threshold
$\lambda$, where many more than $5S$ nodes become active throughout
convergence (the average over 100 trials for the maximum size
of the active set for $\lambda = 0.02$ is $180 = 36S$).

V. Conclusions 

In this paper, we studied a dynamical system for solving
$\ell_1$-minimization problems in the context of CS signal recovery.
In this specific problem setting we are able to give strong 
guarantees about the path followed by the system's internal 
state variables during convergence. Indeed, our results show 
that in typical CS situations, the path followed by the LCA is 
close to optimal, with only a few nodes entering the active 
set during convergence. These results can then be used to 
make strong guarantees on the exponential convergence speed 
of the system, and the quantitative results generally agree 
qualitatively with our simulation results. Interestingly, despite 
the LCA being a completely different computing architecture 
than traditional algorithms being run on a digital computer, 
the conditions of our results directly parallel the established
guarantees for several digital algorithms. As with any signal
processing system, such performance guarantees are important 
to establish before investing significant resources in system 
development and deploying the system in an application. 
The strong performance guarantees of this paper lead us to 
conclude that the LCA, if implemented in a large-scale analog 
circuit, could lead to substantial improvements in the time 
required for CS signal recovery in many problems of interest. 

Appendix A 
Ordinary Differential Equations 

In this section, we give a very brief overview of some
fundamental results in linear ODEs, which we use freely in
our proofs. Let $x(t)$ be a function from $\mathbb{R}_+$ to $\mathbb{R}^N$, $A$ be a
diagonalizable matrix of size $N \times N$ and $b$ be a vector in $\mathbb{R}^N$.
Consider the following ODE:

$$\dot{x}(t) = Ax(t) + b. \tag{23}$$

The solution to (23) with initial point $x(t_k) = x^{t_k}$ is:

$$x(t) = e^{A(t-t_k)}x^{t_k} - \left(I - e^{A(t-t_k)}\right)A^{-1}b.$$

Note that the above expression $\left(I - e^{At}\right)A^{-1}$ is always
well-defined even when the matrix $A$ is singular. To see this,
first diagonalize the matrix as $A = P\Lambda P^{-1}$, where $\Lambda$ is a
diagonal matrix with diagonal elements $\lambda_i$: $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$.
Plugging this in the above expression yields:

$$\left(I - e^{At}\right)A^{-1} = P\left(I - e^{\Lambda t}\right)\Lambda^{-1}P^{-1}
= P\,\mathrm{diag}\left(\left(1 - e^{\lambda_1 t}\right)\lambda_1^{-1}, \ldots, \left(1 - e^{\lambda_N t}\right)\lambda_N^{-1}\right)P^{-1}.$$

When the eigenvalue $\lambda_i$ is non-zero, its inverse exists and the
term in the above matrix is well defined. To see that the terms
still make sense in the case where $\lambda_i$ is equal to zero, we first
take a Taylor expansion when $\lambda_i$ goes to zero:

$$\lambda_i^{-1}\left(1 - e^{\lambda_i t}\right) = \lambda_i^{-1}\left(-\lambda_i t + O(\lambda_i^2)\right) = -t + O(\lambda_i).$$

By continuity, we get that when $\lambda_i = 0$, then $\left(1 - e^{\lambda_i t}\right)\lambda_i^{-1} = -t$
and thus, the matrix $\left(I - e^{\Lambda t}\right)\Lambda^{-1}$ is well defined.
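The continuity argument above also suggests a numerically safe way to evaluate $(I - e^{At})A^{-1}$ when $A$ has zero or near-zero eigenvalues. The sketch below is an illustration under an extra assumption not made in the text: $A$ is taken to be symmetric, so that `numpy.linalg.eigh` returns an orthonormal eigenbasis. The function names are hypothetical.

```python
import numpy as np


def one_minus_exp_over_lambda(lam, t):
    """Evaluate (1 - exp(lam * t)) / lam entrywise, with limit -t at lam = 0."""
    lam = np.asarray(lam, dtype=float)
    out = np.full(lam.shape, -float(t))
    nonzero = lam != 0.0
    # expm1 keeps full precision when lam * t is tiny.
    out[nonzero] = -np.expm1(lam[nonzero] * t) / lam[nonzero]
    return out


def i_minus_expm_times_inv(A, t):
    """(I - e^{At}) A^{-1} for a symmetric matrix A (possibly singular)."""
    lam, P = np.linalg.eigh(A)
    # P diag(f(lam)) P^T, using broadcasting to scale the columns of P.
    return (P * one_minus_exp_over_lambda(lam, t)) @ P.T


if __name__ == "__main__":
    A = np.diag([0.0, -1.0])          # singular: one zero eigenvalue
    print(i_minus_expm_times_inv(A, 2.0))
```

For the singular example $A = \mathrm{diag}(0, -1)$ and $t = 2$, the entry associated with the zero eigenvalue equals $-t = -2$, as predicted by the Taylor expansion.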
In the case where $b$ in (23) varies with time, the solution to
(23) with initial point $x(t_k) = x^{t_k}$ is:

$$x(t) = e^{A(t-t_k)}x^{t_k} + e^{At}\int_{t_k}^{t} e^{-A\nu}\,b(\nu)\,d\nu. \tag{24}$$



Appendix B 
Lemmas 

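In the proofs of Theorems 2 and 3, we will need the
following two lemmas. The first result bounds the $\ell_2$-norm
of the distance between some points of interest and the true
signal $a^\dagger$.

Lemma 1: Let $a^\infty$ be a vector supported on a set $\Gamma$ that
contains less than $p$ indices and that satisfies:

$$\Phi_\Gamma^T\Phi_\Gamma a^\infty_\Gamma = \Phi_\Gamma^T y - \lambda z_\Gamma,$$

where $z_\Gamma = \mathrm{sign}(a^\infty_\Gamma)$. Let $R = |\Gamma \cup \Gamma_\dagger|$ be the number of
elements in the support of $(a^\infty - a^\dagger)$. If $\Phi$ satisfies the RIP
with parameters $(R, \delta)$, then the following holds:

$$\|a^\infty - a^\dagger\|_2 \le \underbrace{(1-\delta)^{-1}\left(\|a^\dagger\|_2 + \sqrt{1-\delta}\,\|\epsilon\|_2 + \lambda\sqrt{p}\right)}_{=\,(1-\delta)(1+\delta)^{-1}C_\delta(p)}.$$

Proof: We start by noting that since $\Phi$ satisfies the RIP
and $p \le R$, then $\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\| \le (1-\delta)^{-1}$, $\|\Phi_\Gamma^T\Phi_{\Gamma_\dagger\cap\Gamma^c}\| \le \delta$
as a submatrix of $\Phi^T\Phi - I$ with at most $S \le R$ columns, and
$\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\Phi_\Gamma^T\| \le (1-\delta)^{-1/2}$.
Splitting $a^\dagger$ into its components on $\Gamma$ and $\Gamma^c$, we notice that:

$$a^\dagger_\Gamma = (\Phi_\Gamma^T\Phi_\Gamma)^{-1}\Phi_\Gamma^T\Phi_\Gamma a^\dagger_\Gamma.$$

We use these facts to finish the proof:

$$\begin{aligned}
\|a^\infty - a^\dagger\|_2
&= \left\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\left(\Phi_\Gamma^T y - \lambda z_\Gamma\right) - a^\dagger\right\|_2 \\
&= \left\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\left(\Phi_\Gamma^T(\Phi a^\dagger + \epsilon) - \lambda z_\Gamma\right) - a^\dagger\right\|_2 \\
&= \left\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\Phi_\Gamma^T\Phi_{\Gamma^c}a^\dagger_{\Gamma^c} - a^\dagger_{\Gamma^c}
 + (\Phi_\Gamma^T\Phi_\Gamma)^{-1}\Phi_\Gamma^T\epsilon - \lambda(\Phi_\Gamma^T\Phi_\Gamma)^{-1}z_\Gamma\right\|_2 \\
&\le \left(1 + \|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\|\,\|\Phi_\Gamma^T\Phi_{\Gamma_\dagger\cap\Gamma^c}\|\right)\|a^\dagger_{\Gamma^c}\|_2
 + \|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\Phi_\Gamma^T\|\,\|\epsilon\|_2
 + \lambda\,\|(\Phi_\Gamma^T\Phi_\Gamma)^{-1}\|\,\|z_\Gamma\|_2 \\
&\le \left(1 + \delta(1-\delta)^{-1}\right)\|a^\dagger\|_2 + (1-\delta)^{-1/2}\|\epsilon\|_2 + \lambda(1-\delta)^{-1}\sqrt{p} \\
&= (1-\delta)^{-1}\left(\|a^\dagger\|_2 + \sqrt{1-\delta}\,\|\epsilon\|_2 + \lambda\sqrt{p}\right).
\end{aligned}$$

■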

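The second result states that the $\ell_2$ distance of the output
states to the true signal remains bounded for all time $t$.

Lemma 2: Assume that at switching time $t_k$, the current
active set $\Gamma_k$ contains less than $p$ indices, that $\Phi$ satisfies the
RIP with parameters $(R, \delta)$, where $R = |\Gamma_k \cup \Gamma_\dagger|$, and that
$a(t_k)$ satisfies:

$$\|a(t_k) - a^\dagger\|_2 \le C_\delta(p).$$

Then, for all $t \in [t_k, t_{k+1}]$, $a(t)$ also satisfies

$$\|a(t) - a^\dagger\|_2 \le C_\delta(p).$$

Proof: Define $a^\infty_{\Gamma_k}$ by $\Phi_{\Gamma_k}^T\Phi_{\Gamma_k}a^\infty_{\Gamma_k} = \Phi_{\Gamma_k}^T y - \lambda z_{\Gamma_k}$. We
apply Lemma 1 to obtain that $\|a^\infty_{\Gamma_k} - a^\dagger\|_2 \le (1-\delta)(1+\delta)^{-1}C_\delta(p)$.
Using the dynamics from (8), we have that for $t \in [t_k, t_{k+1}]$:

$$\begin{aligned}
\|a(t) - a^\dagger\|_2 &= \|a_{\Gamma_k}(t) - a^\dagger\|_2 \\
&= \left\|e^{-A(t-t_k)}a_{\Gamma_k}(t_k) + \left(I - e^{-A(t-t_k)}\right)a^\infty_{\Gamma_k} - a^\dagger\right\|_2 \\
&\le \left\|e^{-A(t-t_k)}\left(a_{\Gamma_k}(t_k) - a^\dagger\right)\right\|_2
 + \left\|\left(I - e^{-A(t-t_k)}\right)\left(a^\infty_{\Gamma_k} - a^\dagger\right)\right\|_2 \\
&\le e^{-(1-\delta)(t-t_k)}\|a_{\Gamma_k}(t_k) - a^\dagger\|_2
 + \left(1 - e^{-(1+\delta)(t-t_k)}\right)\|a^\infty_{\Gamma_k} - a^\dagger\|_2 \\
&\le e^{-(1-\delta)(t-t_k)}C_\delta(p) + \left(1 - e^{-(1+\delta)(t-t_k)}\right)\frac{1-\delta}{1+\delta}C_\delta(p) \\
&\overset{(i)}{\le} C_\delta(p).
\end{aligned}$$

To prove the last inequality, we study the function:

$$h(t) = \left(1 - e^{-(1-\delta)t}\right)C - \left(1 - e^{-(1+\delta)t}\right)\frac{1-\delta}{1+\delta}C.$$

We take the derivative of $h(t)$:

$$\begin{aligned}
h'(t) &= (1-\delta)e^{-(1-\delta)t}C - (1-\delta)e^{-(1+\delta)t}C \\
&= (1-\delta)C\left(e^{-(1-\delta)t} - e^{-(1+\delta)t}\right) \ge 0.
\end{aligned}$$

Notice that $h(0) = 0$, and for all $t \ge 0$, $h'(t) \ge 0$, so for all
$t \ge 0$, we have $h(t) \ge 0$, and the inequality (i) holds.

Finally, since the vector $a(t) - a^\dagger$ is continuous with time:

$$\|a_{\Gamma_{k+1}}(t_{k+1}) - a^\dagger\|_2 = \|a_{\Gamma_k}(t_{k+1}) - a^\dagger\|_2 \le C_\delta(p). \qquad ■$$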
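The key step in the proof of Lemma 2 is the non-negativity of $h(t) = (1 - e^{-(1-\delta)t})C - (1 - e^{-(1+\delta)t})\frac{1-\delta}{1+\delta}C$. As a sanity check, the short sketch below evaluates this function on a grid of times for a few values of $\delta$. It is only a numerical illustration, not a substitute for the derivative argument, and the function name `h` with default `C = 1` is chosen here for convenience.

```python
import math


def h(t: float, delta: float, C: float = 1.0) -> float:
    """Function studied in the proof of Lemma 2 (C is a positive constant)."""
    first = (1.0 - math.exp(-(1.0 - delta) * t)) * C
    second = (1.0 - math.exp(-(1.0 + delta) * t)) * (1.0 - delta) / (1.0 + delta) * C
    return first - second


if __name__ == "__main__":
    for delta in (0.1, 0.3, 0.5):
        smallest = min(h(0.01 * k, delta) for k in range(0, 2001))
        limit = 2.0 * delta / (1.0 + delta)   # value of h as t grows large
        print(f"delta = {delta}: min h = {smallest:.2e}, limit = {limit:.3f}")
```

The function starts at $h(0) = 0$, increases monotonically, and saturates at $2\delta C/(1+\delta)$, which is the slack between the bound $C_\delta(p)$ and the distance of the fixed point to the true signal.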



Appendix C 
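Proof of Theorem 2

Proof: To prove that the active set $\Gamma$ is a subset of $\Gamma_\dagger$ for
all time $t$, we show by induction that for all switching times
$t_k$, for all time $t \in (t_k, t_{k+1})$, we have for all $j \in \Gamma_\dagger^c$:

$$|u_j(t)| \le \lambda. \tag{25}$$

If this condition is satisfied, then no node in $\Gamma_\dagger^c$ will cross
threshold and the next active set $\Gamma_{k+1}$ will remain a subset of
$\Gamma_\dagger$. We will also need the following induction hypothesis, for
all $t \in (t_k, t_{k+1})$:

$$\|a_{\Gamma_k}(t) - a^\dagger\|_2 \le C_\delta(S). \tag{26}$$

By the theorem hypotheses, the initial active set is a subset
of $\Gamma_\dagger$, so (25) holds, and (26) holds. We now assume that
the induction hypotheses hold for a particular switching time
$t_k$. If there is no more switching after $t_k$, then we are done.
Otherwise, using the node dynamics, we know that, for all
$j \in \Gamma_k^c$, we have $\forall t \in [t_k, t_{k+1}]$:

$$u_j(t) = e^{-(t-t_k)}u_j(t_k) + e^{-t}\int_{t_k}^{t} e^{\nu}\rho_j(\nu)\,d\nu,$$

with $\rho_j(\nu) = \Phi_j^T\left(y - \Phi_{\Gamma_k}a_{\Gamma_k}(\nu)\right)$. We bound the absolute
value of the expression above using:

$$\begin{aligned}
|u_j(t)| &\le e^{-(t-t_k)}|u_j(t_k)| + e^{-t}\left|\int_{t_k}^{t} e^{\nu}\rho_j(\nu)\,d\nu\right| \\
&\le e^{-(t-t_k)}|u_j(t_k)| + \left(1 - e^{-(t-t_k)}\right)\sup_{\nu' \in [t_k, t_{k+1}]}|\rho_j(\nu')|.
\end{aligned}$$

Since at time $t_k$, node $j$ is inactive, we have: $|u_j(t_k)| \le \lambda$. As a
consequence, condition (25) is satisfied if:

$$\sup_{\nu' \in [t_k, t_{k+1}]}|\rho_j(\nu')| \le \lambda. \tag{27}$$

We will use the fact that the matrix $\Phi_j^T\Phi_{\Gamma_\dagger}$ is a submatrix of
$\Phi^T\Phi - I$ with $(S + 1)$ distinct columns and apply the RIP of
order $(S + 1)$: $\|\Phi_j^T\Phi_{\Gamma_\dagger}\| \le \delta$. Then, we have that for all time
$t \in [t_k, t_{k+1}]$, for all nodes $j \in \Gamma_\dagger^c$:

$$\begin{aligned}
|\rho_j(t)| &= \left|\Phi_j^T\left(y - \Phi_{\Gamma_k}a_{\Gamma_k}(t)\right)\right| \\
&= \left|\Phi_j^T\left(\Phi_{\Gamma_\dagger}a^\dagger + \epsilon - \Phi_{\Gamma_k}a_{\Gamma_k}(t)\right)\right| && (y = \Phi_{\Gamma_\dagger}a^\dagger + \epsilon) \\
&= \left|\Phi_j^T\Phi_{\Gamma_\dagger}\left(a^\dagger - a_{\Gamma_k}(t)\right) + \Phi_j^T\epsilon\right| && (\text{since } \Gamma_k \subset \Gamma_\dagger) \\
&\le \left|\Phi_j^T\Phi_{\Gamma_\dagger}\left(a^\dagger - a_{\Gamma_k}(t)\right)\right| + \left|\Phi_j^T\epsilon\right| \\
&\le \|\Phi_j^T\Phi_{\Gamma_\dagger}\|\,\|a^\dagger - a_{\Gamma_k}(t)\|_2 + \|\Phi_{\Gamma_\dagger^c}^T\epsilon\|_\infty \\
&\le \delta\,\|a^\dagger - a_{\Gamma_k}(t)\|_2 + \|\Phi_{\Gamma_\dagger^c}^T\epsilon\|_\infty.
\end{aligned}$$

We apply Lemma 2 to get a bound that holds uniformly across
time, for $t \in [t_k, t_{k+1}]$: $\|a(t) - a^\dagger\|_2 \le C_\delta(S)$. In particular,
$\|a(t_{k+1}) - a^\dagger\|_2 \le C_\delta(S)$ and the induction hypothesis
(26) remains true at time $t_{k+1}$.

Putting the pieces together and using condition (14), we
have for all time $t \in [t_k, t_{k+1}]$ and for all nodes $j \in \Gamma_\dagger^c$:

$$|\rho_j(t)| \le \delta\,C_\delta(S) + \|\Phi_{\Gamma_\dagger^c}^T\epsilon\|_\infty
\le \lambda\left(1 - \alpha\delta\sqrt{S}\right) + \alpha\delta\sqrt{S}\,\lambda = \lambda.$$

This shows that (25) holds for all time $t \in [t_k, t_{k+1}]$, which
ends the proof by induction of the theorem. ■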

Appendix D 
Proof of Theorem 3

Proof: We want to show that for all time during convergence,
no more than $q$ nodes are active at once. First,
we introduce some notation and denote by $\Delta(t)$ the set
containing the $q$ biggest nodes in $u(t)$ at time $t$. This set
depends on time, but we will often remove the dependence
in the notation for readability. We will prove the theorem by
showing that for all time $t$, nodes in $\Delta^c(t)$ cannot be above
threshold, so that only the $q$ biggest nodes can be active.
By definition of $\Delta(t)$, for any node $j$ in $\Delta^c(t)$, we have
$|u_j(t)| \le \|u_{\Delta(t)}(t)\|_2/\sqrt{q}$. As a consequence, we prove the
theorem by induction, by showing that for all switching times
$t_k$, for $t \in [t_k, t_{k+1}]$, we have that

$$\|u_{\Delta(t)}(t)\|_2 \le \lambda\sqrt{q}. \tag{28}$$

We will also need the following induction hypothesis:

$$\|a(t) - a^\dagger\|_2 \le C_\delta(q). \tag{29}$$

By hypothesis, the initial state has less than $q$ nodes active
and (28) is true at time $t = 0$. Moreover:

$$\|a(0) - a^\dagger\|_2 \le \|a(0)\|_2 + \|a^\dagger\|_2 \le \|u(0)\|_2 + \|a^\dagger\|_2
\le \lambda\sqrt{q} + \|a^\dagger\|_2 \le C_\delta(q),$$

and (29) also holds at $t = 0$.

We now assume that for some switching time $t_k$, the active
set $\Gamma_k$ has less than $q$ nodes and (28) and (29) hold for
$t \in [t_{k-1}, t_k]$. If there is no more switching, we are done.
Otherwise, note that since nodes in the active set satisfy
$|u_n| > \lambda$, while nodes in the inactive set satisfy $|u_n| \le \lambda$,
and since the active set $\Gamma_k$ contains less than $q$ nodes, we
are guaranteed that $\Gamma_k \subset \Delta(t)$ for all $t \in [t_k, t_{k+1}]$. The
node dynamics on $\Delta(t)$ (denoted by $\Delta$ in the remainder for
readability) for $t \in [t_k, t_{k+1}]$ are:

$$u_\Delta(t) = e^{-(t-t_k)}u_\Delta(t_k) + e^{-t}\int_{t_k}^{t} e^{\nu}\rho_\Delta(\nu)\,d\nu,$$

where $\rho_\Delta(\nu) = a_\Delta(\nu) - \Phi_\Delta^T\Phi a(\nu) + \Phi_\Delta^T y$. We can bound the
$\ell_2$-norm of this quantity as follows:

$$\begin{aligned}
\|u_\Delta(t)\|_2 &\le e^{-(t-t_k)}\|u_\Delta(t_k)\|_2
 + e^{-t}\int_{t_k}^{t} e^{\nu}\sup_{\nu' \in [t_k, t_{k+1}]}\|\rho_\Delta(\nu')\|_2\,d\nu \\
&\le e^{-(t-t_k)}\|u_\Delta(t_k)\|_2 + \left(1 - e^{-(t-t_k)}\right)\sup_{\nu' \in [t_k, t_{k+1}]}\|\rho_\Delta(\nu')\|_2.
\end{aligned} \tag{30}$$

By the induction hypothesis, we know that

$$\|u_\Delta(t_k)\|_2 \le \|u_{\Delta(t_k)}(t_k)\|_2 \le \lambda\sqrt{q}.$$

We now find a bound for all $t \in (t_k, t_{k+1})$ for:

$$\begin{aligned}
\|\rho_\Delta(t)\|_2 &= \left\|a_\Delta(t) - \Phi_\Delta^T\Phi a(t) + \Phi_\Delta^T y\right\|_2 \\
&= \left\|a^\dagger_\Delta + \left(a_\Delta(t) - a^\dagger_\Delta\right) - \Phi_\Delta^T\Phi_{\Gamma_\dagger\cup\Delta}\left(a(t) - a^\dagger\right) + \Phi_\Delta^T\epsilon\right\|_2 \\
&\le \|a^\dagger_\Delta\|_2 + \left\|I - \Phi_\Delta^T\Phi_{\Gamma_\dagger\cup\Delta}\right\|\,\|a(t) - a^\dagger\|_2 + \|\Phi_\Delta^T\|\,\|\epsilon\|_2.
\end{aligned}$$

Since $\Delta$ has at most $q$ indices, we can apply the RIP of order
$S + q$ to the matrices $I - \Phi_\Delta^T\Phi_{\Gamma_\dagger\cup\Delta}$ and $\Phi_\Delta$. Moreover, since
(29) holds at time $t_k$, we can apply Lemma 2, which proves
that the induction condition (29) is true at time $t_{k+1}$ and gives
a uniform bound on the quantity $\|a(t) - a^\dagger\|_2$. We obtain:

$$\begin{aligned}
\|\rho_\Delta(t)\|_2 &\le \|a^\dagger\|_2 + \delta\,C_\delta(q) + (1+\delta)\|\epsilon\|_2 \\
&= \left(1 + \delta(1+\delta)(1-\delta)^{-2}\right)\|a^\dagger\|_2
 + (1+\delta)\left(\delta(1-\delta)^{-2}\sqrt{1-\delta} + 1\right)\|\epsilon\|_2 \\
&\quad + \delta(1+\delta)(1-\delta)^{-2}\lambda\sqrt{q} \\
&\le (1+\delta)(1-\delta)^{-2}\left(\|a^\dagger\|_2 + \sqrt{1-\delta}\,\|\epsilon\|_2 + \delta\lambda\sqrt{q}\right).
\end{aligned}$$

Applying the theorem hypothesis (19), we get

$$\|\rho_\Delta(t)\|_2 \le (1-\delta)^{-2}\left(1 - 3\delta + \delta(1+\delta)\right)\lambda\sqrt{q} = \lambda\sqrt{q}.$$

Plugging this back into (30), we proved that for all $t \in
[t_k, t_{k+1}]$, we have $\|u_\Delta(t)\|_2 \le \lambda\sqrt{q}$ and in particular, we
proved that the induction condition (28) holds, which finishes
the proof. ■